(54) Text searching system

(57) The invention relates to a text searching sys- 
tem (30,60) for searching web pages according to key- 
word and classification data (46,64) provided by a user. 
The text searching system (30,60) comprises a compu- 
ter having a memory (32) for storing programs and data 
and a processor (34) for executing the programs stored 
in the memory (32), a text data file (36) stored in the 
memory (32) having text data (38) of web pages of a 
plurality of world wide web sites, a text index file (40) 
stored in the memory (32) having keyword searching 
data (42) for searching keywords contained in the text 
data (38) of each of the web pages of the text data file 
(36), a classification index file (44,62) stored in the 
memory (32) having classification data (46,64) corre- 
sponding to the classification (54) of each of the web 
pages of the text data file (36), and a searching program 
(48,66) stored in the computer for searching the text 
index file (40) and the classification index file (44,62) 
according to keyword and classification data (46,64) 
provided by a user so as to find text data (38) which are 
matched with the user provided keyword data and con- 
tained in a plurality of target web pages whose classifi- 
cations (54) are matched with the user provided 
classification data (46,64) in the text data file (36). 
Description

[0001] The invention relates to a text searching system
according to the pre-characterizing portion of claim 1.

[0002] As the number of web pages on the internet
increases, a searching system becomes necessary for
searching the myriad of web pages for specific
information. A corresponding prior art searching system
comprises a computer having a memory in which a text data
file, a text index file comprising keyword searching data,
and a searching program are stored. Since the system
uses a keyword for searching web pages, the text data
of all the web pages containing the keyword are
returned. This requires an excessive amount of
transmission time, and most of the transmitted web pages
do not fit well into the classification intended by the
user. Therefore, additional search and transmission time
must be spent. For example, if the user wants to search
for web pages about movies containing references to
"tornado", the searching system will transmit to the user
the text data of all web pages containing the word
"tornado". However, these transmitted web pages will
include irrelevant pages concerning unrelated topics
such as meteorology, history and news. Therefore,
more time must be spent manually selecting the pages
that are actually pertinent.

[0003] With these problems in mind, the present
invention aims at providing a text searching system for
searching web pages according to a keyword which is
nevertheless more economical with respect to both
transmission time and subsequent manual selection.

[0004] This is achieved by the present invention as
claimed in claim 1. The dependent claims define
advantageous further developments of the invention.

[0005] Because, according to the invention, the system
additionally includes a classification index file having
classification data, and a searching program for
searching for text data matching both user provided
keyword data and user provided classification data, the
search can be performed in a much more narrowly defined
manner, thereby avoiding the output of unrelated pages.

[0006] In the following, the invention is described in
more detail with reference to the accompanying
drawings, in which

Fig. 1 is a functional block diagram of a prior art 
searching system as mentioned above, 
Fig. 2 is a perspective diagram of the keyword 
searching data in the system of Fig. 1, 
Fig. 3 is a functional block diagram of a text search- 
ing system according to the present invention, and 
Fig. 4 is a perspective diagram of another text 
searching system according to the present inven- 
tion. 

[0007] The prior art text searching system 10 shown in
Fig. 1 comprises a computer (not shown), a text data
file 16, a text index file 20, and a searching program 24.
The computer comprises a memory 12 for storing programs
and data and a processor 14 for executing the programs
stored in the memory 12. The text data file 16, text
index file 20, and searching program 24 are stored in the
memory 12. The text data file 16 has text data 18 of web
pages of a plurality of world wide web sites. The text
index file 20 has keyword searching data 22 for searching
keywords contained in the text data 18 of each of the web
pages of the text data file 16. The searching program 24
is used for searching the text index file 20 according to
keyword data provided by a user so as to find text data 18
of all the web pages having the user provided keyword
data in the text data file 16.

[0008] As can be seen in Fig. 2, the keyword
searching data 22 of the text index file 20 are built
according to the text data 18 of the text data file 16.
Each set of keyword searching data 22 has a keyword
21 and address data 23 of the keyword 21 in all web
pages. As shown in Fig. 2, the address data of the
keyword "world" in all web pages are a1, a2, a3...; the
address data of the keyword "world wide web" in all web
pages are c1, c2, c3.... When the user inputs a keyword,
the searching program 24 searches the text index file 20
according to the keyword provided by the user to find
the keyword searching data 22 corresponding to the
keyword for getting the address data of the keyword in
all web pages. Finally, the text data file 16 is used for
transmitting to the user the text data 18 of all web pages
having the keyword.
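
By way of illustration only, the keyword lookup of Fig. 2 may be sketched in Python as follows. The sketch assumes that pages are split into words on whitespace (so a multi-word keyword such as "world wide web" is not handled) and that the address data are (page identifier, word offset) pairs. The names TEXT_DATA, build_text_index and keyword_search, as well as the sample pages, are hypothetical and do not appear in the system described above.

```python
from collections import defaultdict

# Hypothetical text data file: page identifier -> text of the page.
TEXT_DATA = {
    "p1": "tornado the movie opens this week",
    "p2": "a tornado warning was issued by the weather service",
    "p3": "history of the world wide web",
}


def build_text_index(text_data):
    """Map each keyword to its address data: (page id, word offset) pairs."""
    index = defaultdict(list)
    for page_id, text in text_data.items():
        for offset, word in enumerate(text.split()):
            index[word].append((page_id, offset))
    return index


def keyword_search(index, text_data, keyword):
    """Return the text of every page containing the keyword."""
    pages = sorted({page_id for page_id, _ in index.get(keyword, [])})
    return {page_id: text_data[page_id] for page_id in pages}


TEXT_INDEX = build_text_index(TEXT_DATA)
```

In this sketch a query for "tornado" returns both the movie page and the weather page, since the lookup has no notion of classification.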

[0009] As mentioned before, because the prior art
searching system 10 uses a keyword for searching web
pages, the text data of all web pages containing the
keyword are returned. This takes an excessive amount of
time to transmit. In searching for the web pages within a
specific classification, the searching system 10
transmits the text data of all the web pages containing
the keyword to the user but most of the transmitted web
pages are not well matched with the user provided
classification. Therefore, more search and transmission
time must be spent and, nevertheless, finally the pages
actually pertinent have to be selected manually.

[0010] A text searching system 30 according to the
present invention, as shown in Fig. 3, comprises
a computer (not shown), a text data file 36, a text index
file 40, a classification index file 44, and a searching
program 48. The computer comprises a memory 32 for
storing programs and data and a processor 34 for
executing the programs stored in the memory 32. The text
data file 36, text index file 40, classification index file 44
and searching program 48 are stored in the memory 32.
The text data file 36 has text data 38 of web pages of a
plurality of world wide web sites. The text index file 40
has keyword searching data 42 for searching keywords
contained in the text data 38 of each of the web pages
of the text data file 36. The classification index file 44
has classification data 46 corresponding to the
classification of each of the web pages of the text data
file 36. The searching program 48 is used for searching
the text index file 40 and the classification index file 44
according to keyword and classification data provided by
a user so as to find text data 38 which are matched with
the user provided keyword data and contained in a
plurality of target web pages whose classifications are
matched with the user provided classification data in the
text data file 36.

[0011] The keyword searching data 42 of the text
index file 40 are built according to the text data 38 of
the text data file 36. Each set of keyword searching
data 42 has a keyword and address data of the keyword in
all web pages. The classification data 46 of the
classification index file 44 have a plurality of
classifications 54, and each classification 54 has web
page data 50 of all the web pages belonging to the
classification. The web page data 50 of each web page
comprise keyword position indexing data 52 of the web
page. The keyword position indexing data 52 are used for
pointing to the positions of the keyword searching
data 42 of the specific web page contained in the text
index file 40.
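
By way of illustration only, one possible way of building such a classification index is sketched below in Python. It is assumed, purely for the purposes of the sketch, that the text index is a flat list in which the entries of each web page are stored contiguously, so that the keyword position indexing data can be a (start, stop) range into that list. This layout, the name build_indexes and the sample pages are hypothetical.

```python
# Hypothetical input: (page identifier, classification, text of the page).
PAGES = [
    ("p1", "movies", "tornado the movie"),
    ("p2", "meteorology", "tornado warning issued"),
    ("p3", "movies", "a quiet drama"),
]


def build_indexes(pages):
    """Build a flat text index and a classification index pointing into it."""
    text_index = []            # entries: (keyword, page id, word offset)
    classification_index = {}  # classification -> list of web page data
    for page_id, classification, text in pages:
        start = len(text_index)
        for offset, word in enumerate(text.split()):
            text_index.append((word, page_id, offset))
        # Web page data: the page plus its keyword position indexing data,
        # modelled here as the (start, stop) range of its text index entries.
        web_page_data = (page_id, (start, len(text_index)))
        classification_index.setdefault(classification, []).append(web_page_data)
    return text_index, classification_index


text_index, classification_index = build_indexes(PAGES)
```

With the sample pages, the classification "movies" holds the web page data of p1 and p3, each pointing at a three-entry range of the text index.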

[0012] When a user inputs keyword and classifica- 
tion data, the searching program 48 searches the clas- 
sification index file 44 according to the classification 
data provided to find the web page data 50 of all web 
pages belonging to the classification data. Then, the 
searching program 48 searches the position of the key- 
word searching data 42 of the text data 38 of each web 
page in the text index file 40 according to the keyword 
position indexing data 52 of the web page data 50. 
Then, the searching program 48 searches the keyword 
searching data 42 of all web pages belonging to the 
classification data in the text index file 40 according to 
the keyword provided by the user to find the text data 38 
of all web pages which belong to the classification data 
and have the keyword. Finally, the text data file 36 is 
used for transmitting the text data 38 of all web pages 
belonging to the classification data and having the key- 
word to the user. 
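
By way of illustration only, the classification-first search described above may be sketched in Python as follows. The indexes are written out by hand and use a hypothetical layout: a flat text index of (keyword, page identifier, word offset) entries and, per classification, web page data consisting of a page identifier and a (start, stop) range into the text index. The function name search_classification_first is likewise an assumption.

```python
# Hypothetical text data file and hand-built indexes.
TEXT_DATA = {
    "p1": "tornado the movie",
    "p2": "tornado warning issued",
    "p3": "a quiet drama",
}
TEXT_INDEX = [
    ("tornado", "p1", 0), ("the", "p1", 1), ("movie", "p1", 2),
    ("tornado", "p2", 0), ("warning", "p2", 1), ("issued", "p2", 2),
    ("a", "p3", 0), ("quiet", "p3", 1), ("drama", "p3", 2),
]
CLASSIFICATION_INDEX = {
    "movies": [("p1", (0, 3)), ("p3", (6, 9))],
    "meteorology": [("p2", (3, 6))],
}


def search_classification_first(keyword, classification):
    """Return the text of pages in the classification that contain the keyword."""
    results = {}
    # Step 1: find the web page data of all pages in the classification.
    for page_id, (start, stop) in CLASSIFICATION_INDEX.get(classification, []):
        # Step 2: follow the position indexing data to this page's entries.
        entries = TEXT_INDEX[start:stop]
        # Step 3: match the keyword only within those entries.
        if any(word == keyword for word, _, _ in entries):
            # Step 4: fetch the text data of the matching page.
            results[page_id] = TEXT_DATA[page_id]
    return results
```

In this sketch a query for "tornado" within "movies" returns only p1, and the text index entries of p2 are never examined.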

[0013] Fig. 4 is a perspective diagram of another
text searching system 60 according to the present
invention. The classification index file 62 of the text
searching system 60 contains the classification data 64
of the web pages of each keyword searching data 42 in
the text index file 40. When a user inputs keyword and
classification data, the searching program 66 searches
the text index file 40 according to the keyword provided
to find all the keyword searching data 42 matched with
the user provided keyword data and the address data of
the keyword in all the web pages. Then, the searching
program 66 searches the classification index file 62
according to the keyword searching data 42 to find the
classification data 64 of the web page of each matched
keyword searching data 42. The searching program 66
then selects, according to the classification data
provided by the user, all keyword searching data 42
belonging to that classification data, so as to find the
text data 38 of all web pages which belong to the
classification data and have the keyword. Finally, the
text data file 36 is used for transmitting the text
data 38 of all web pages belonging to the classification
data and having the keyword to the user.
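
By way of illustration only, the keyword-first search of the text searching system 60 may be sketched in Python as follows. The sketch assumes a text index mapping each keyword to (page identifier, word offset) address data, and models the classification index file 62 simply as a mapping from page identifier to classification. This simplification, the function name search_keyword_first and the sample data are hypothetical.

```python
# Hypothetical text data file and hand-built indexes.
TEXT_DATA = {
    "p1": "tornado the movie",
    "p2": "tornado warning issued",
    "p3": "a quiet drama",
}
# Text index: keyword -> address data as (page id, word offset) pairs.
TEXT_INDEX = {
    "tornado": [("p1", 0), ("p2", 0)],
    "movie": [("p1", 2)],
    "drama": [("p3", 2)],
}
# Classification index, simplified to: page id -> classification.
CLASSIFICATION_INDEX = {"p1": "movies", "p2": "meteorology", "p3": "movies"}


def search_keyword_first(keyword, classification):
    """Return the text of pages that contain the keyword and match the classification."""
    results = {}
    # Step 1: find all address data matching the keyword.
    for page_id, _offset in TEXT_INDEX.get(keyword, []):
        # Step 2: look up the classification of the page holding the match.
        # Step 3: keep the match only if it agrees with the user's classification.
        if CLASSIFICATION_INDEX.get(page_id) == classification:
            # Step 4: fetch the text data of the matching page.
            results[page_id] = TEXT_DATA[page_id]
    return results
```

For the same sample data this ordering yields the same pages as a classification-first search; only the order in which the two index files are consulted differs.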

[0014] The text searching system 30 uses the
classification index file 44 to find all web pages
belonging to the classification data provided by the
user, and then uses the text index file 40 and the
keyword provided by the user to find all the web pages
belonging to the classification data and having the
keyword. The text searching system 60 uses the text
index file 40 to find all web pages having the keyword
provided by the user, and then uses the classification
index file 62 and the classification data provided by the
user to find all the web pages belonging to the
classification data and having the keyword.

[0015] Compared with the prior art searching system
10, the text searching systems 30, 60 according to
the present invention use keyword and classification
data provided by the user and find all the web pages
that belong to the classification data and have the
keyword. The text searching systems 30, 60 transmit only
the text data of the web pages belonging to the
classification data and having the keyword to the user.
Therefore, the searching and transmission time is
greatly reduced and the text searching system is more
efficient.


EP 1 056 024 A1

Claims

1. A text searching system (30, 60) comprising:

a computer having a memory (32) for storing
programs and data and a processor (34) for
executing the programs stored in the memory
(32);

a text data file (36) stored in the memory (32)
having text data (38) of web pages of a plurality
of world wide web sites; and

a text index file (40) stored in the memory (32)
having keyword searching data (42) for searching
keywords contained in the text data (38) of
each of the web pages of the text data file (36);

characterized in that:

the text searching system (30,60) further
comprises:

a classification index file (44,62) stored in the
memory (32) having classification data (46,64)
corresponding to the classification (54) of each
of the web pages of the text data file (36); and

a searching program (48,66) stored in the
computer for searching the text index file (40) and
the classification index file (44,62) according to
keyword and classification data (46,64) provided
by a user so as to find text data (38)
which are matched with the user provided keyword
data and contained in a plurality of target
web pages whose classifications (54) are
matched with the user provided classification
data (46,64) in the text data file (36).

2. The text searching system (30) of claim 1 wherein
the classification index file (44) contains a plurality
of classifications (54) and web page data (50) of all
the web pages belonging to each of the
classifications (54), and wherein the searching program (48)
searches the classification index file (44) to find all
the target web pages whose classifications (54) are
matched with the user provided classification data (46),
and then searches the text index file (40) to find text
data (38) which are matched with the user provided
keyword data and contained in the target web
pages of the text data file (36).

3. The text searching system (30) of claim 2 wherein
the web page data (50) of each specific web page
in the classification index file (44) contain keyword
position indexing data (52) for pointing to the positions
of the keyword searching data (42) of the specific
web page contained in the text index file (40), and
wherein the searching program (48) searches the
classification index file (44) to find the positions of
the keyword searching data (42) of the target web
pages in the text index file (40), and then searches
the keyword searching data (42) of the target web
pages to find the text data (38) which are matched
with the user provided keyword data and contained
in the target web pages of the text data file (36).

4. The text searching system (60) of claim 1 wherein
the classification index file (62) contains the
classification of the web page of each keyword searching
data (42) in the text index file (40), and wherein the
searching program (66) searches the text index file
(40) to find all the keyword searching data (42)
matched with the user provided keyword data, and
then searches the classification index file (62) to
find the classification of the web page of each
matched keyword searching data (42) so as to
locate the keyword searching data (42) of the target
web pages, and finally finds the text data (38)
contained in the text data file (36) using the keyword
searching data (42) of the target web pages.